ABSTRACT

In RNA, unlike in DNA, the uracil nucleobase and its
modifications, in addition to the 2'-OH group, play
essential roles in the structural and functional
diversity of non-coding RNAs. The non-canonical U•U base
pair is ubiquitous in non-coding RNAs, which are highly
diversified. However, it is not completely clear how
uracil plays these diversifying roles. To investigate and
compare the uracil in U-A and U•U base pairs, we
probed them with a selenium atom by synthesizing the
novel 4-Se-uridine (SeU) phosphoramidite and
Se-nucleobase-modified RNAs (SeU-RNAs), where the
exo-4-oxygen of uracil is replaced by selenium. Our crystal
structure studies of U-A and U•U pairs reveal that the
native and Se-derivatized structures are virtually
identical, and that both U-A and U•U pairs can
accommodate the large Se atom. Our thermostability and
crystal structure studies indicate that the weakened
H-bonding in the U-A pair may be compensated by base
stacking, and that the stacking of the trans-Hoogsteen
U•U pairs may stabilize the RNA duplex and its junction.
Our results confirm that the hydrogen bond (O4···H-C5) of
the Hoogsteen pair is weak. Using the Se atom probe, our
Se-functionalization studies provide more insight into
the U•U interaction and the participation of U in the
structural and functional diversification of nucleic acids.

INTRODUCTION 

Unlike natural DNA, which merely stores genetic information
in cells (1), natural RNA is highly diversified in
structure and function. Because of this diversity,
RNA performs essential functions in cells and expands the
complexity of living systems by serving as genetic
information carrier, catalyst and regulator (2-10). Recently,
a tremendous number of functional RNAs have been discovered
as non-coding RNAs (ncRNAs), such as ribozymes,
riboswitches, small interfering RNA (siRNA), microRNA
(miRNA), small nuclear RNA (snRNA) and RNAs
regulating biological pathways. ncRNAs can control
gene expression selectively through transcriptional and
translational regulation (11,12), participate in chromatin
silencing and remodeling (13), regulate retroviral
activity (14), catalyze biochemical reactions (15,16),
recognize metabolites (17), and facilitate gene function
studies and drug discovery (18,19). ncRNAs play highly
specific roles by folding into various 3D structures and
binding specifically to other molecules or ligands (such
as proteins and metabolites), which may trigger cascades
of biological events.

However, considering the similar chemical structures of
nucleic acid building blocks (such as the almost identical
nucleobases in RNA and DNA), it is striking that RNA
with the extra 2'-OH is able to establish much more
diversified structures and functions than DNA (20,21). In
addition to the 2'-OH group, it appears that RNA
modifications and non-canonical base pairings are the
two major strategies to overcome the structural homogeneity
limit caused by the four similar nucleobases and to
achieve huge diversity in both structure and function
(22-24). In particular, the uracil nucleobase can form
multiple non-canonical base pairings and play essential
roles in diversifying RNA structure and function. The
non-canonical U•U base pair is ubiquitous in ncRNA, and
the Watson-Crick U-A pair can often be replaced with a U-G
wobble pair without significant duplex destabilization,
which increases the structural and functional diversity of
ncRNAs. U•U pairs are often observed in RNA duplex
junctions and loops (25-27), whereas the U-A pair is
normally not formed at these places. Replacing a U-A pair
in a duplex with a U•U pair significantly destabilizes the
duplex structure. It is not completely clear how uracil
plays the diversifying roles in these base pairs to
achieve the structural and functional diversity. To
investigate and compare the roles played by uracil in these
non-canonical and canonical pairs, we probed the U•U and
U-A pairs with a Se atom, where the exo-4-oxygen of uracil
is replaced by selenium.

Though 4-Se-uridine was synthesized over three decades
ago (28,29), it has not been incorporated into RNAs
because of the synthetic challenges. Recently, our
successes in the synthesis and biophysical studies of the
Se-nucleobase modifications (30-35) have encouraged us
to overcome the SeU-RNA synthesis challenge, meet the
urgent needs in ncRNA investigation and probe U-A and
U•U pairs with a Se atom. Herein, we report the first
synthesis of the 4-Se-uridine phosphoramidite (SeU) and the
corresponding SeU-RNAs by replacing the 4-oxygen with
selenium. We have found that this Se-modification does
not cause significant perturbation and that the native and
modified structures are virtually identical. We also found
that, via stacking and hydrogen bonding, the uracil
nucleobase interacts differently in the RNA duplex and the
duplex junction. Moreover, the accommodation of the
larger selenium atom by both U-A and U•U pairs
implies RNA flexibility. Our studies suggest that by
presenting their different faces and edges, uracil and
uridine are capable of diversifying the structure and
function of ncRNAs. Furthermore, this Se-modified uridine
provides the Se-RNAs with an additional UV absorption
(λmax = 370 nm; ε = 1.30 × 10⁴ M⁻¹ cm⁻¹). Excitingly, after
replacement of a single oxygen atom with selenium, we have
observed for the first time colored RNAs (light yellow)
as well as colored RNA crystals (dark yellow). The color
property of the SeU-RNAs is unique and has great potential
in RNA visualization, detection, spectroscopic study
and crystallography of RNAs and protein-RNA
complexes and interactions, demonstrating the usefulness
of selenium-derivatized nucleic acids (SeNA) (36,37) in
structural biology. In addition, both the anomalous
phasing and molecular replacement approaches result in
identical crystal structures. Our new method provides
a unique atomic tool for probing the structure and function
of ncRNAs and their protein complexes.

MATERIALS AND METHODS 

Synthesis of the 4-Se-uridine phosphoramidite

3-(1-((2R,3S,4S,5R)-5-((bis(4-methoxyphenyl)(phenyl)
methoxy)methyl)-3-(tert-butyldimethylsilyloxy)-4-
hydroxy-tetrahydrofuran-2-yl)-2-oxo-1,2-
dihydropyrimidin-4-ylselanyl)propanenitrile

To a dry THF solution (10 ml) of the starting material
compound (1, 1.34 g, 2 mmol), 4,4'-dimethylamino-pyridine
(24.5 mg, 0.2 mmol) and triethylamine (0.56 ml,
4 mmol) under argon, a dry tetrahydrofuran (THF)
solution (10 ml) of 2,4,6-triisopropylbenzenesulfonyl
chloride (906 mg, 3.0 mmol) was added dropwise. The
reaction was stirred for 1 h until completion (monitored
by thin layer chromatography (TLC), 5% methanol in
dichloromethane). At the same time, a NaBH4 suspension
(250 mg of NaBH4 in 3 ml of EtOH) was injected into a flask
containing di(2-cyanoethyl) diselenide [(NCCH2CH2Se)2,
0.3 ml, d = 1.8 g/ml, 2.0 mmol] and THF (10 ml) in an ice
bath under argon. The yellow color of the diselenide
disappeared in ~15 min, giving an almost colorless suspension
of sodium 2-cyanoethylselenide (NCCH2CH2SeNa). Then, the
reacted solution of compound 1 was slowly injected into this
selenide solution. After the selenium incorporation was
completed in 45 min (monitored by TLC, 5% MeOH in
CH2Cl2, product Rf = 0.60), water (100 ml) was added to
the reaction flask. The solution was adjusted to pH 7-8
using CH3COOH (10%) and was then extracted with
ethyl acetate (3 × 100 ml). The organic phases were
combined, washed with NaCl (sat., 100 ml), dried over
MgSO4 (s) for 30 min and evaporated to minimum
volume under reduced pressure. The crude product was
then dissolved in methylene chloride (5 ml) and purified
on a silica gel column equilibrated with hexanes/methylene
chloride (1:1). The column was eluted with a gradient of
methylene chloride (CH2Cl2, 0.5%, 1% and 2% MeOH in
CH2Cl2, 300 ml each). After evaporation of the collected
fractions and drying under high vacuum, pure compound 2 was
obtained as a slightly yellow foam (1.27 g, 81%
yield). 1H-NMR (400 MHz, CDCl3) δ: 0.21 (s, 3H, CH3),
0.38 (s, 3H, CH3), 0.95 (s, 6H, 2× CH3), 2.31-2.37 (m, 1H,
H-2'), 3.00 (dd, J = 6.5 and 6.7 Hz, 2H, CH2-Se), 3.37-3.41
(m, 2H, CH2-CN), 3.50-3.52 (m, 2H, H-5'), 3.81 (s, 6H, 2×
OCH3), 4.17-4.22 (m, 1H, H-3'), 4.31 (s, 1H, 3'-OH), 4.40-
4.50 (m, 1H, H-4'), 5.78 (s, 1H, H-1'), 5.90 (d, 1H,
J = 6.8 Hz, H-5), 6.8-6.90 (m, 4H, aromatic), 7.20-7.46
(m, 9H, aromatic), 8.31 (d, 1H, J = 6.8 Hz, H-6).
13C-NMR (100 MHz, CDCl3) δ: -4.30, -4.40 (CH3), 18.1
(CH2-CN), 19.0 (CH2-CH2-CN), 20.5 [(CH3)2C(t-Bu)],
25.9 (CH3), 55.3 (OCH3), 68.7 (C-3'), 76.4 (C-2'), 83.1
(C-4'), 91.0 (C-1'), 106.0 (C-5), 118.8 (CN), 113.3, 127.1,
128.0, 128.2, 130.1, 135.0, 135.3, 144.2, 158.7 (Ar-C),
140.4 (C-6), 153.3 (C-2), 175.0 (C-4). HRMS (ESI-TOF):
molecular formula, C39H49N3O7SeSi; [M+H]+: 778.2413
(calc. 778.2426).
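
The reported yield of compound 2 can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the average atomic masses are rounded standard values assumed for this example rather than taken from the text. The molecular formula, the 2 mmol scale and the 1.27 g isolated mass come from the procedure above.

```python
# Average atomic masses in g/mol (rounded standard values; an assumption of this sketch).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999,
               "Se": 78.971, "Si": 28.085}


def molar_mass(composition):
    """Return the average molar mass (g/mol) for an element -> count mapping."""
    return sum(ATOMIC_MASS[element] * count for element, count in composition.items())


def percent_yield(product_g, molar_mass_g_mol, limiting_mmol):
    """Percent yield from the isolated mass and the amount of limiting reagent."""
    product_mmol = product_g / molar_mass_g_mol * 1000.0
    return 100.0 * product_mmol / limiting_mmol


# Compound 2: C39H49N3O7SeSi, 1.27 g isolated from 2 mmol of compound 1.
compound_2 = {"C": 39, "H": 49, "N": 3, "O": 7, "Se": 1, "Si": 1}
mw_2 = molar_mass(compound_2)
yield_2 = percent_yield(1.27, mw_2, 2.0)
print(f"M(2) = {mw_2:.1f} g/mol, yield = {yield_2:.1f} %")
```

Under these assumptions the result falls between 81% and 82%, in line with the reported 81% yield.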

(2R,3S,4S,5R)-2-((bis(4-methoxyphenyl)(phenyl)
methoxy)methyl)-4-(tert-butyldimethylsilyloxy)-5-
(4-(2-cyanoethylselanyl)-2-oxopyrimidin-1(2H)-yl)-
tetrahydrofuran-3-yl-2-cyanoethyl
diisopropylphosphoramidite

To the flask (25 ml) containing 2 (453 mg, 0.68 mmol)
under argon, dry methylene chloride (2.5 ml),
N,N-diisopropylethylamine (0.17 ml, 1.03 mmol, 1.5 eq.), and
2-cyanoethyl N,N-diisopropyl-chlorophosphoramidite
(195 mg, 0.83 mmol, 1.2 eq.) were added sequentially (3).
The reaction mixture was stirred at -10 °C in an ice-salt
bath under argon for 10 min, followed by removal of the
bath. The reaction was completed in 2 h at room temperature,
generating a mixture of two diastereomers (indicated
by TLC, 5% MeOH in CH2Cl2, product Rf = 0.63 and
0.68). The reaction was then quenched with NaHCO3
(5 ml, sat.) and stirred for 5 min, followed by extraction
with CH2Cl2 (3 × 8 ml). The combined organic layer
was washed with NaCl (10 ml, sat.) and dried over MgSO4
(s) for 30 min, followed by filtration. The solvent was then
evaporated under reduced pressure, and the crude product
was re-dissolved in CH2Cl2 (2 ml). This solution was
added dropwise to cold petroleum ether (or hexane)
(200 ml) under vigorous stirring, generating a white
precipitate. The petroleum ether layer was decanted. The
crude product was re-dissolved in CH2Cl2 (2 ml)
and then loaded on an Al2O3 column (neutral) that was
equilibrated with CH2Cl2/hexanes (1:1). The column
was eluted with a gradient of methylene chloride and ethyl
acetate [CH2Cl2 to CH2Cl2/EtOAc (7:3)]. After solvent
evaporation and drying under high vacuum, compound
3 (612 mg) was obtained as a white foam (92%
yield). 1H-NMR (400 MHz, CDCl3, two sets of signals
from a mixture of two diastereomers) δ: 0.2-0.4 (m, 12H,
4× CH3), 0.85-1.20 [m, 36H, 8× CH3-iPr and 4×
Si(CH3)], 2.30-2.38 and 2.70-2.82 (2× m, 4H, 2× H-2'),
2.34 and 2.64 (2× t, J = 6.4 Hz, 4H, 2× O-CH2-CH2-CN),
3.00-3.04 (m, 4H, 2× Se-CH2-CH2-CN), 3.32-3.44 (m,
6H, 2× H-5', 2× Se-CH2), 3.52-3.64 (m, 8H, 4× CH-iPr,
2× O-CH2-CH2-CN), 3.73-3.84 (m, 2H, 2× H-5'),
3.82 and 3.83 (2× s, 12H, 4× OCH3), 4.12-4.35 (m, 2H,
2× H-3'), 4.43-4.48 (m, 2H, 2× H-4'), 5.70-5.90 (m, 4H,
2× H-5 and 2× H-1'), 6.83-6.88 (m, 8H, aromatic),
7.27-7.43 (m, 18H, aromatic), 8.30 and 8.39 (2× s, 2H,
2× H-6). HRMS (ESI-TOF): molecular formula,
C48H64N5O8PSeSi; [M+H]+: 978.3479 (calc. 978.3505).

Synthesis of the SeU-RNAs

All the RNA oligonucleotides were chemically synthesized
on a 1.0 µmol scale on solid phase. The ultra-mild RNA
phosphoramidites protected with 2'-TBDMS were used
(Glen Research). The concentration of the SeU-
phosphoramidite was 0.08 M in acetonitrile, compared
with 0.1 M for the regular ones. Coupling was carried out
using 5-(benzylmercapto)-1H-tetrazole solution (0.25 M)
in acetonitrile with a 12 min coupling time for both native
and Se-modified phosphoramidites. Three percent
trichloroacetic acid in methylene chloride was used for the
5'-detritylation. Synthesis was performed on controlled-pore
glass (CPG-500) immobilized with the appropriate
nucleoside through a succinate linker. All oligonucleotides
were prepared in the dimethoxytrityl (DMTr)-on form. After
synthesis, the RNAs were cleaved from the solid support and
fully deprotected with 0.05 M K2CO3 (methanol solution)
for 8 h at room temperature, followed by neutralization,
evaporation and treatment with tetrabutylammonium
fluoride (TBAF) solution (1 M in THF) overnight.
After desalting and HPLC purification, the 5'-DMTr
group was removed with a 3% aqueous solution of
trichloroacetic acid, and the solution was neutralized to
pH 7.0 with a freshly made triethylammonium acetate (TEAAc)
buffer and precipitated with NaCl (final concentration:
0.3 M before ethanol addition) and ethanol (3 volumes).
The ethanol suspension was placed at -80 °C for 1 h,
followed by centrifugation to collect the RNAs.

HPLC analysis and purification 

The RNA oligonucleotides were analyzed and purified by
reverse-phase high performance liquid chromatography
(RP-HPLC) in the DMTr-on form. After the TBAF desilylation
and desalting with Sephadex G-25, HPLC purification
was carried out using a 21.2 × 250 mm Zorbax RX-C8
column at a flow rate of 6 ml/min. Buffer A consisted of
10 mM TEAAc (pH 7.1), whereas buffer B contained 50%
acetonitrile and 10 mM TEAAc (pH 7.1). Similarly, the
HPLC analysis was performed on a Zorbax SB-C18
column (4.6 × 250 mm) at a flow rate of 1.0 ml/min using the
same buffer system. The DMTr-on oligonucleotides were
eluted in a 20-min linear gradient of 100% buffer A to
100% buffer B. The HPLC analysis for both DMTr-on
and DMTr-off oligonucleotides was carried out with up
to 60% of buffer B in a linear gradient over the same period
of time. The collected fractions were lyophilized, and
the purified RNAs were re-dissolved in water for the
detritylation and precipitation steps.
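
The linear gradients described above translate directly into mobile-phase composition over time. The following is a minimal sketch with hypothetical function names; it assumes an ideal linear gradient with no dwell volume and that buffer A contains no acetonitrile, which is consistent with the buffer description but not stated as an instrument setting.

```python
def percent_b(t_min, start_b=0.0, end_b=100.0, duration_min=20.0):
    """Percent buffer B at time t for a linear gradient, clamped at both ends."""
    fraction = min(max(t_min / duration_min, 0.0), 1.0)
    return start_b + (end_b - start_b) * fraction


def percent_acetonitrile(t_min, acn_in_b=50.0, **gradient):
    """Acetonitrile content (% v/v) of the mobile phase at time t.

    Buffer B holds `acn_in_b` percent acetonitrile; buffer A is assumed to hold none.
    """
    return percent_b(t_min, **gradient) * acn_in_b / 100.0


# Purification gradient: 100% A to 100% B in 20 min.
for t in (0, 5, 10, 20):
    print(t, percent_b(t), percent_acetonitrile(t))

# Analytical gradient: up to 60% B over the same 20 min.
print(percent_acetonitrile(20, end_b=60.0))
```

With these assumptions the purification gradient ends at 50% acetonitrile, and the analytical gradient ends at 30%.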

Thermodenaturation of the SeU-RNAs

Solutions of the duplex RNAs (1 or 2 µM) were prepared
by dissolving the purified RNAs in sodium phosphate
[10 mM (pH 6.5)] buffer containing 100 mM NaCl. The
solutions were heated to 75 °C for 3 min, then cooled
down slowly to room temperature and stored at 4 °C
overnight before Tm measurement. Before thermal
denaturation, the Se-RNA samples were bubbled with argon for
5 min. Each denaturation curve was acquired at 260 nm
by heating and cooling between 5 and 70 °C four times at a
rate of 0.5 °C/min, using a Cary-300 UV-Visible
spectrometer equipped with a temperature controller system.

Se-RNA crystallization and diffraction data collection 

The purified RNA oligonucleotides (1 mM) were heated to
70 °C for 2 min and cooled down slowly to room temperature.
Both native buffer and the Nucleic Acid Mini Screen Kit
(Hampton Research) were applied to screen the
crystallization conditions at different temperatures using
the hanging drop method by vapor diffusion (1 µl of RNA
and 1 µl of buffer). Thirty percent glycerol, PEG 400 or
perfluoropolyether was used as a cryoprotectant
during crystal mounting, and data were collected
under a liquid nitrogen stream at 99 K. The Se-RNA
crystal data were collected at beamlines X12B and
X12C at NSLS, Brookhaven National Laboratory. A
number of crystals were screened to find the ones with
strong anomalous scattering at the K-edge absorption of
selenium. The distance from the detector to the crystals was
set to 150 mm. A radiation wavelength of 0.9795 Å was
chosen for diffraction data collection and selenium
single-wavelength anomalous dispersion (SAD) phasing. The
crystals were exposed for 10 s per image with 1° oscillation,
and a total of 180 images were taken for each data
set. All data were processed using HKL2000 and
DENZO/SCALEPACK (38).
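
As a small worked check of the data collection parameters, the chosen wavelength can be converted to photon energy and compared with the selenium K-edge. This is an illustrative sketch: the constant hc and the tabulated edge energy of about 12.658 keV are reference values assumed here, not numbers quoted in the text, and the function name is hypothetical.

```python
HC_KEV_ANGSTROM = 12.398419843  # h*c in keV*Angstrom (reference constant, assumed)
SE_K_EDGE_KEV = 12.658          # tabulated Se K-edge energy (assumed reference value)


def photon_energy_kev(wavelength_angstrom):
    """Photon energy in keV for an X-ray wavelength in Angstrom (E = hc / lambda)."""
    return HC_KEV_ANGSTROM / wavelength_angstrom


energy_kev = photon_energy_kev(0.9795)   # wavelength used for SAD data collection
offset_ev = (energy_kev - SE_K_EDGE_KEV) * 1000.0

images = 180
total_rotation_deg = images * 1.0        # 1 degree oscillation per image
total_exposure_min = images * 10 / 60    # 10 s exposure per image

print(f"E = {energy_kev:.4f} keV, offset from Se K-edge = {offset_ev:.1f} eV")
print(f"rotation = {total_rotation_deg} deg, exposure = {total_exposure_min} min")
```

Under these assumptions the wavelength sits within a fraction of an electronvolt of the Se K-edge, and each data set covers 180° of rotation in 30 min of exposure.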

Structure determination and refinement 

The structures of the Se-RNAs were solved by both SAD with
HKL2MAP and molecular replacement with Phaser (39),
followed by refinement with Refmac. Both SAD
phasing and molecular replacement led to the same
crystal structure. The refinement protocol included
simulated annealing, positional refinement, restrained
B-factor refinement and bulk solvent correction. The
stereochemical topology and geometrical restraint parameters
of DNA/RNA (40) were applied. The topologies and
parameters for the uridine modified with selenium (US)
were constructed and applied. After several cycles of
refinement, a number of highly ordered waters were added.
Finally, the occupancies of selenium were adjusted.
Cross-validation (41) with a 5-10% test set was monitored
during the refinement. The σA-weighted maps (42) of
the (2m|Fo| - D|Fc|) and the difference (m|Fo| - D|Fc|)
density maps were computed and used throughout the
model building.

RESULTS AND DISCUSSION 

Synthesis of the 4-Se-uridine (SeU) phosphoramidite

We have developed a facile strategy to synthesize the
Se-phosphoramidite. As shown in Scheme 1, our synthesis
started from the partially protected 2'-TBDMS-5'-trityl-
uridine (1). To simplify the synthesis, we used a bulky
reagent (2,4,6-triisopropylbenzenesulfonyl chloride,
TIBS-Cl) to selectively activate position 4, thus avoiding
the protection and deprotection steps of the 3'-hydroxyl
group. Without purifying the activated intermediate, the
selenium functionality was introduced by substituting the
TIBS group at position 4 with 2-cyanoethylselenide in
81% yield. Sodium 2-cyanoethylselenide was
generated by the reduction of di-(2-cyanoethyl) diselenide
with NaBH4 in ethanol solution (30). This protected
Se-functionality is compatible with solid-phase synthesis
and can be removed by weak base treatment (K2CO3 in
methanol). Finally, the 4-Se-uridine derivative (2) was
converted to the corresponding phosphoramidite (3) in
92% yield. The analysis data are shown in the supporting
information (Supplementary Figures S1-S7).

Synthesis of the SeU-RNAs 

The ultramild phosphoramidites, whose base-labile
protecting groups can be removed with a weak base
(K2CO3 in methanol) (30,32,33,35,43), were used
because the 4-Se-functionality is sensitive to strong base
cleavage (such as ammonia, causing deselenization). We
found that this Se-modified phosphoramidite is compatible
with the longer coupling time (12 min), I2 oxidation
and trichloroacetic acid treatment without deselenization.

Scheme 1. Synthesis of SeU-phosphoramidite (3) and SeU-RNAs (4).
Reagents and conditions: (a) TIBS-Cl, 4,4'-dimethylamino-pyridine,
CH2Cl2, room temperature; (b) (NCCH2CH2Se)2/NaBH4, EtOH; (c)
2-cyanoethyl N,N-diisopropylchloro-phosphoramidite and N,N-
diisopropylethylamine in CH2Cl2; (d) the solid-phase synthesis.
TIBS-Cl: 2,4,6-(triisopropylbenzene)sulfonyl chloride.



In the case of RNAs containing multiple guanosine
residues, phenoxyacetic anhydride (Pac2O) instead of
acetic anhydride was used in the capping step to avoid
the acetylation of guanosine, which is difficult to remove
under the mild deprotecting conditions (K2CO3 in
methanol). All Se-RNAs were synthesized in the DMTr-on
form, followed by cleavage and deprotection with a
0.05 M methanol solution of K2CO3. After the deprotection,
the solution was carefully neutralized with 1 M HCl
and evaporated to dryness. Then the 2'-TBDMS groups
were removed by treatment with 1 M TBAF solution in
THF at room temperature overnight. After desilylation
and desalting, a typical HPLC profile of the crude
Se-RNAs is shown in Supplementary Figure S8, which
indicates a high coupling yield of the Se-uridine
phosphoramidite (96%), compared with incorporation
of the non-modified phosphoramidites. After desalting
with Sephadex-G25 matrix, the pure Se-RNAs were
obtained by RP-HPLC purification, followed by
mild detritylation (44). Several SeU-RNAs containing
Watson-Crick U-A and Hoogsteen U•U pairs were
synthesized, purified and characterized (Table 1 and
Supplementary Figures S8 and S9). Excitingly, we
observed for the first time that RNA with a single
Se-atom substitution is visibly colored (yellow).
UV-vis spectroscopic study indicated that the Se-RNA has
λmax at 260 and 370 nm (Figure 1), resulting from the
native nucleobases and SeU, respectively. The colored
RNAs can be used as potential probes for many biochemical
and biomedical applications. We also found that the
Se-RNA crystals are yellow, indicating that this
Se-derivatization is especially useful for the crystallization
screening of RNAs and protein-RNA complexes. The
color is due to the ease of electron delocalization on
the nucleobase after the selenium derivatization, which
red-shifts the spectrum significantly, by over 100 nm.
Furthermore, it is worth mentioning that this
Se-functionality is relatively stable. After heating the
Se-RNA at 70 °C for 8 h, no significant decomposition was
observed, as indicated by UV and HPLC analyses (Figures 1A
and 2).

Determination of the extinction coefficient of SeU (ε370)

To determine the extinction coefficient of the 4-Se-uridine
residue (SeU) by comparison with the native nucleotide,
we synthesized and purified SeUMP and 5'-SeUU-3'.
Their HPLC profiles are presented in Figure 3. The HPLC
step, which removes or minimizes the interference
of impurities, allows accurate measurement of the
extinction coefficients (43). Our experimental results
indicate that the SeU residue absorbs at both 260 and 370 nm
(Figure 3A). The absorption ratio at these two wavelengths
is 5.71, calculated on the basis of the HPLC
peak areas. As the extinction coefficient is proportional
to the absorption, Equation (1) is deduced. In addition,
from the HPLC profile (Figure 3B) of 5'-SeUU-3', the ratio
between the absorption at 260 nm (contributed by both
native U and SeU) and 370 nm (only by SeU) is determined
as 0.920. Thus, Equation (2) is deduced. As the extinction
coefficient of native U at 260 nm (ε260 = 9.66 ×
10³ M⁻¹ cm⁻¹) is known (45), we calculated the extinction

Table 1. MALDI-TOF-MS analysis of SeU-RNAs

Entry  Se-RNA (molecular formula)                                      Measured (calcd) m/z
1      5'-U-SeU-CGCG-3' (C56H71N20O41P5Se)                             [M+H]+: 1915.4
2      5'-G-SeU-GUACAC-3' (C76H95N30O53P7Se)                           [M+H]+: 2573.3
3      5'-GUG-SeU-ACAC-3' (C76H95N30O53P7Se)                           [M+H]+: 2573.5
4      5'-AUGG-SeU-GCUC-3' (C85H106N32O62P8Se)                         [M+H]+: 2895.3
5      5'-CGCGAA-SeU-UCGCG-3' (C114H144N46O81P11Se)                    [M+H]+: 3873.3
6      5'-CGCGAAU-SeU-CGCG-3' (C114H144N46O81P11Se)                    [M+H]+: 3874.0
7      5'-U-SeU-AUAUAUAUAUAA-3' (C133H162N49O95P13Se)                  [M+H]+: 4449.7
8      5'-AA-SeU-A(2'-SeMe-U)AUAUAUAUU-3' (C134H164N49O94P13Se2)       [M+H]+: 4526.4
9      5'-GG-SeU-AUUGCGGUACC-3' (C133H165N52O97P13Se)                  [M+H]+: 4526.4
10     5'-A-SeU-CACCUCCUUA-3' (C111H141N38O82P11Se)                    [M+H]+: 3740.8
11     U-SeU-AGCUAGCU (C94H117N34O69P9Se)                              [M+H]+: 3186.2
12     U-SeU-CGCGAUCGCG (C113H142N43O83P11Se)                          [M+H]+: 3851.7
13     U-SeU-CAUGUGACC (C103H129N37O76P10Se)                           [M+H]+: 3489.8

Figure 1. The UV spectra and color of the SeU-RNA. (A) Red line
(λmax = 260 and 370 nm): UV spectrum of the SeU-RNA (5'-G-SeU-
GUACAC-3') without heating; black broken line (λmax = 260 and
367 nm): UV spectrum of the SeU-RNA after heating at 70 °C for 8 h;
(B) the SeU-RNA (yellow, 1.0 mM) and the corresponding native RNA
(colorless, 1.0 mM).

coefficient of SeU at 370 nm (ε370) and 260 nm (ε260) as
13.0 and 2.28 × 10³ M⁻¹ cm⁻¹, respectively.
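
The calculation can be reproduced from the two HPLC ratios. The sketch below assumes that Equation (1) has the form ε370(SeU) = 5.71 × ε260(SeU) and that Equation (2) has the form [ε260(U) + ε260(SeU)] / ε370(SeU) = 0.920; both forms are inferred from the description of the ratios and are not the authors' printed equations. The function name is hypothetical.

```python
def solve_seu_extinction(eps260_u, ratio_370_to_260, ratio_dimer_260_to_370):
    """Solve the two assumed relations for the SeU extinction coefficients.

    Assumed Equation (1): eps370_se = ratio_370_to_260 * eps260_se
    Assumed Equation (2): (eps260_u + eps260_se) / eps370_se = ratio_dimer_260_to_370

    Substituting (1) into (2) gives
        eps260_se = eps260_u / (ratio_370_to_260 * ratio_dimer_260_to_370 - 1)
    """
    eps260_se = eps260_u / (ratio_370_to_260 * ratio_dimer_260_to_370 - 1.0)
    eps370_se = ratio_370_to_260 * eps260_se
    return eps370_se, eps260_se


# Inputs quoted in the text: eps260 of native U and the two peak-area ratios.
eps370_se, eps260_se = solve_seu_extinction(9.66e3, 5.71, 0.920)
print(f"eps370(SeU) = {eps370_se:.3g} M^-1 cm^-1")
print(f"eps260(SeU) = {eps260_se:.3g} M^-1 cm^-1")
```

Under these assumed forms the solution comes out at roughly 1.30 × 10⁴ and 2.27 × 10³ M⁻¹ cm⁻¹, which agrees with the reported values to within the rounding of the input ratios.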

Figure 2. Thermal stability analysis of the SeU-RNA (5'-G-SeU-GUACAC-
3'). HPLC profiles 1 and 2 (without heating of the Se-RNA) were
monitored at 260 and 370 nm, respectively. HPLC profiles 3, 4 and 5
(monitored at 370 nm) are analyses of the Se-RNA heated at 70 °C for
2, 5 and 8 h, respectively.



Thermodenaturation study 

The rationale for using a Se atom to probe the U-A and
U•U base pairs is that selenium, a large atom, can
probably strengthen the stacking interaction and is a
poorer hydrogen-bond acceptor (30,32,33) that can likely
weaken the hydrogen-bond (H-bond) interaction. The
polarizable and large Se atom with delocalizable electrons
can increase the stacking interaction by narrowing the
gap between the stacked nucleobases, which is observed
in our crystal structure presented in this work.
Furthermore, the increase of the stacking interaction by
this Se atomic probe is consistent with the computational
study of the Se-modified thymidine in a DNA duplex (46).
Thus, the Se-atom probe that alters the stacking and
H-bonding interactions may provide novel insights into
the base pairs. To investigate the RNA duplex recognition
and stability, we carried out a UV-melting study with
RNAs containing the 4-Se-uracil in duplexes or in duplex
junctions (or overhang regions). Typical curves of

Figure 3. Calculation of ε370 and ε260 of SeU by RP-HPLC analysis.
(A) HPLC profile of 3'-SeUMP at 260 nm (solid line) and 370 nm (dashed
line). (B) HPLC profile of the 5'-SeUU-3' dimer at 260 nm (solid line)
and 370 nm (dashed line). The samples (3'-SeUMP and 5'-SeUU-3') were
analyzed on a Welchrom XB-C18 column (4.6 × 250 mm, 5 µm) at a flow
rate of 1.0 ml/min and with a linear gradient of 5-50% B in 20 min,
with retention times of 10.3 and 13.6 min, respectively. Buffer A:
10 mM TEAAc (pH 7.1); B: 50% acetonitrile in 10 mM TEAAc (pH 7.1).



Se-RNA melting temperatures (Tm) are shown in
Figure 4, and all the Tm data are summarized in
Table 2, compared with the corresponding native RNA
duplexes. When the Se-atom probe is introduced to the
uracil in RNA duplexes, no significant Tm differences
between the native and Se-modified duplexes were
observed (entries 1-8 in Table 2), and the free energy
(ΔG) differences from the corresponding natives were
almost zero. This suggests that the Se-atom probe in
RNA duplex regions may not cause significant perturbation
of duplex stability. As selenium is a poor H-bond
acceptor, it is anticipated that the Se-mediated H-bond in
the U-A pair is weak. The zero (or very small) free energy
difference between the native and Se-modified RNA
duplexes also indicates that the stability increase via the
stronger stacking compensates for the stability decrease
caused by the weaker H-bonding. This observation
reveals that the modified U-A base pair can maintain a
fine balance between the stacking and H-bonding
interactions.

It is reported that a U•U pair is less stable than
a U-G or C-A mispair in an RNA duplex (33,47). In
RNA duplex junctions and loops, however, two consecutive
U•U pairs are more stable than two consecutive
A-A pairs (48). Thus, the Se-atom probe is used to

Figure 4. (A) Normalized Tm curve of the Se-RNA (5'-UUA-SeU-
AUAUAUAUAA-3')2, compared with the corresponding native
RNA. The Se-RNA (circle line): Tm = 37.3 ± 0.5 °C; the native
(diamond line): Tm = 38.0 ± 0.3 °C. (B) Normalized Tm profiles of
the Se-RNA 10mer (5'-rU-SeU-AGCUAGCU-3')2 and 12mer (5'-U-SeU-
CGCGAUCGCG-3')2, compared with their corresponding natives.
The native RNA 10mer (gray dash-dot line): Tm = 42.2 ± 0.2 °C; the
Se-RNA 10mer (gray solid line): Tm = 47.1 ± 0.3 °C; the native RNA
12mer (black dash-dot line): Tm = 59.4 ± 0.3 °C; the Se-RNA 12mer
(black solid line): Tm = 63.2 ± 0.3 °C.



investigate the non-canonical U•U pair, and we chose and
modified RNAs forming an RNA duplex with a UU
junction (Table 2). The UV-thermal denaturation study
was carried out, and the melting temperatures (Tm) of
the Se-RNAs and their corresponding natives are
summarized in Table 2 (entries 9-14). Excitingly, when
the atomic probe is introduced to the RNA duplex junctions,
the melting temperatures increase by 1.5-2.4 °C per
Se-modification of these RNA duplexes. Consistently, the
free energy (ΔG) calculation indicates that each Se atom
contributes additional stabilization (0.4-0.8 kcal/mol) to
the RNA duplexes. This increased RNA
duplex stability is attributed to the increased stacking
interaction of SeU on the duplex ends; the support from
the high-resolution structure data is presented later. Via
the Se-atom probe, the UV-melting study of the duplex
RNAs containing the UU junction indicates that the

Table 2. UV-melting temperatures of SeU-RNAs

uracil stacking contributes significantly to RNA duplex
stability.
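
The per-modification Tm increase can be illustrated with the 10mer and 12mer melting temperatures quoted in the Figure 4 caption. This is a minimal sketch with a hypothetical function name; it assumes that each self-complementary duplex carries two Se atoms (one SeU per strand), which is inferred from the sequence notation rather than stated explicitly.

```python
def delta_tm_per_modification(tm_modified, tm_native, n_modifications):
    """Average melting-temperature change (deg C) per Se modification."""
    return (tm_modified - tm_native) / n_modifications


# (Tm native, Tm Se-modified) in deg C for the junction-containing duplexes.
duplexes = {"10mer": (42.2, 47.1), "12mer": (59.4, 63.2)}
SE_PER_DUPLEX = 2  # assumption: one SeU per strand of a self-complementary duplex

per_modification = {
    name: delta_tm_per_modification(tm_se, tm_native, SE_PER_DUPLEX)
    for name, (tm_native, tm_se) in duplexes.items()
}
for name, value in per_modification.items():
    print(f"{name}: {value:.2f} deg C per Se atom")
```

Under this assumption the two duplexes give increases of about 2.4-2.5 °C and 1.9 °C per Se atom, broadly in line with the range quoted in the text.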

Crystallization, diffraction data collection and crystal 
structure determination 

To investigate the Se-nucleobase modification and its
structural properties, we crystallized two Se-RNA
sequences [the hexamer (5'-rU-SeU-CGCG-3')2 with overhangs
and the octamer (5'-rGUG-SeU-ACAC-3')2 forming a perfect
duplex]. Crystals of both Se-RNA sequences
formed in 2-5 days at room temperature (25 °C) with the
Hampton Nucleic Acid Mini Screen kit (24 buffers in total,
covering broad conditions). Excitingly, all crystals of both
Se-RNAs had a strong yellow or dark yellow color
because of the selenium modification (Figures 5 and 6).
The Se-RNA hexamer formed crystals in 22 of 24 buffers
using the kit, whereas the corresponding native RNA
formed crystals in only 4 of 24 buffers (in 3 weeks) using
the kit. Most of these Se-RNA crystals (one example
shown in Figure 5) diffracted very well, up to 1.3 Å
resolution (the orthorhombic space group, C2221). Similarly,
the Se-RNA octamer formed crystals in 22 of 24 buffers
using the same kit, and these crystals (examples shown in
Figure 6) diffracted up to 2.5 Å resolution (the
rhombohedral space group, R32). In contrast, the
corresponding native (5'-rGUGUACAC-3')2 did not crystallize
under any conditions over several weeks, which is
consistent with the literature (49). The native octamer
(5'-rGUGUACAC-3')2 is difficult to crystallize, and its
structure has not been reported in the literature. Finally,
several high-quality crystals from these two Se-RNAs were
mounted and cryo-protected for the diffraction data
collection. The structures were determined using the best
diffraction data sets collected from the crystals grown in
buffer No. 10 [10% MPD, 40 mM Na cacodylate (pH 6.0),
12 mM spermine tetra-HCl, 12 mM NaCl and 80 mM
KCl] for the Se-hexamer and No. 12 [10% MPD, 40 mM
Na cacodylate (pH 6.0), 12 mM spermine tetra-HCl,
80 mM KCl and 20 mM BaCl2] for the Se-octamer. The
statistics of the structural analysis are summarized
in Table 3, and the determined Se-RNA structures are
presented in Figures 5 and 6.

Structures of 4-Se-derivatized RNAs 

The structure of the Se-RNA hexamer (Figure 5) revealed 
formation of the right-handed Watson-Crick duplex 
(Supplementary Table SI) and Hoogsteen base pairs. 
The structures determined via SAD and molecular re- 
placement approaches are identical. The Se-modified 
structure (PDB ID: 3HGA; 1.30 Å resolution) and the
corresponding native structure (PDB ID: 1OSU; 1.40 Å
resolution) (50) are virtually identical as well. They
superimpose on each other almost perfectly (Figure 5C),
with an RMSD of 0.09 Å, indicating close structural
isomorphism. Moreover, the electron delocalization of
the large Se atom on the uracil may facilitate the
nucleobase stacking interaction, as also supported by the
computational study of the Se-modified nucleobase (46).
Furthermore, the Se atom is 0.43 Å larger than O, yet the
distances between U2 4-exo-Se and the 3'-cytosine atoms
(N3, exo-N4, C4 and C5) are similar to the corresponding
native distances between U2 4-exo-O and the 3'-cytosine
atoms (Figure 5D and E); the distances between the 4-Se
or 4-O atom and the 3'-C atoms are also displayed. Thus,
the comparison of the Se-modified and native structures 
(Figure 5D-I) suggests that the Se-nucleobase may better 
stack on the 3'-cytosine than the native nucleobase. The 
stronger stacking interaction can rigidify the local conformation
and strengthen the RNA duplexes, which is
consistent with the stronger duplex stability in the
presence of the UU overhang (or duplex junction;
Table 2). These results are also consistent with the faster
crystal growth after the selenium modification. Similar to
the corresponding native structure (50), two SeU•U pairs
(Hoogsteen pairs) have been observed in the Se-RNA
(Figure 5F and G). In the Se-modified and native structures,
both SeU•U and U•U pairs participate in formation
of a pseudo-fiber and long duplex through the overhang

Figure 5. The yellow crystal and structures of the 4-Se-U RNA hexamer, (5'-U-SeU-CGCG-3')2. The purple and red balls represent Se and O atoms,
respectively. (A) The picture of the yellow Se-RNA crystal (0.1 × 0.1 × 0.1 mm). (B) Structure of the Se-RNA duplex containing the Se-RNA hexamer
(in red), the base-paired CGCG (in green) and the U•U-paired U-SeU (in blue). (C) Superimposition of the Se-modified structure (in red; PDB ID:
3HGA; 1.30 Å resolution) and the native structure (in cyan; PDB ID: 1OSU; 1.40 Å resolution); the RMSD value is 0.09 Å. (D) Se-modified U2 stacks
on its 3'-cytosine; the distance between the Se atom and exo-N4 of 3'-cytosine is 3.3 Å; the distance between the Se atom and C4 of 3'-cytosine is
3.5 Å. (E) is the top view of (D). (F) Native U2 stacks on its 3'-cytosine; the distance between the O atom and exo-N4 of 3'-cytosine is 3.3 Å; the
distance between the O atom and C4 of 3'-cytosine is 3.3 Å. (G) is the top view of (F). (H) Electron density map (2Fo-Fc) and model of the SeU•U
pair at the level of 1.0 σ. (I) Superimposition of SeU•U pair (in red) with native U•U pair (in cyan); the H-bond lengths are indicated individually.



Hoogsteen base pairs. The 5'-UU sequence allows the
RNAs (both the Se-modified and native ones) to stack and
elongate infinitely along the 2₁ screw axis in the
crystals, with nicks at the 5'-end of each 5'-U(SeU). This
5'-U-SeU sequence forms the two symmetrical SeU•U base
pairs, which are virtually identical to the native U•U pairs
(Figure 5G). Namely, this junction sequence forms the
two symmetrical SeU•U base pairs, which glue the RNA
duplexes together in a head-to-tail linear fashion.
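
The RMSD value quoted above for the superimposed Se-modified and native structures can be illustrated with a minimal sketch. It assumes that the two models have already been superimposed and that their atoms are matched one-to-one; the optimal fitting step itself is not shown. The function name and the coordinates are hypothetical and are not taken from the deposited structures.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Hypothetical coordinates in Å; each atom of model_b is displaced by 0.1 Å.
model_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
model_b = [(0.1, 0.0, 0.0), (1.5, 0.1, 0.0), (1.5, 1.5, 0.1)]
print(f"RMSD = {rmsd(model_a, model_b):.2f} Å")
```

Because the deviation is averaged over all matched atoms, a small overall RMSD is compatible with a slightly larger local displacement at a single modified nucleobase.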

The results of our crystal structure study are consistent 
with the UV-melting study. The 5'-UU of one RNA 
molecule (e.g. the red one in Figure 6A) forms two U•U

Figure 6. Schematic diagram and local structures of the native and modified U•U pairs in overhang regions (or duplex junctions). (A) Schematic
diagram of the RNA duplex with five strands, the nicks, SeU•U pairs and normal Watson-Crick C-G pairs. (B) Superimposition comparison of
SeU14•U1 (in red) with native U14•U1 pair (in cyan); the numbers represent the H-bond lengths (Å). (C) The stacking of two SeU•U pairs with the
distance (3.11 Å) between the two neighbor Se atoms in the modified U14 and U2. (D) The stacking of two native U•U pairs with the distance
(3.29 Å) between the two neighbor O atoms in native U14 and U2. The 2Fo-Fc maps of Se-4 and O-4 are shown.



Table 3. Diffraction data collection and refinement statistics of the Se-RNA structures

Structure (PDB ID)                    U-SeU-CGCG (3HGA)         GUG-SeU-ACAC (4IQS)
                                      Se-Hexamer                Se-Octamer
Data collection
  Space group                         C222₁                     R32
  Cell dimensions: a, b, c (Å)        30.255, 34.079, 28.931    47.006, 47.006, 354.105
  α, β, γ (°)                         90, 90, 90                90, 90, 120
  Resolution range, Å (last shell)    50.00-1.30 (1.32-1.30)    50.0-2.60 (2.69-2.60)
  Unique reflections                  3773 (162)                8915 (846)
  Completeness, %                     95.9 (90.0)               99.2 (95.8)
  Rmerge, %                           4.5 (26.1)                5.3 (35.8)
  I/σ(I)                              40.5 (1.2)                35.9 (1.0)
  Redundancy                          11.7 (4.2)                10.0 (6.1)
Refinement
  Resolution range, Å                 22.62-1.30                31.73.0-2.60
  Rwork, %                            18.9                      19.4
  Rfree, %                            22.5                      25.8
  Number of reflections               3586                      4776
  Number of atoms
    Nucleic acid (single)             157                       1002
    Heavy atoms and ion               1 Se                      6 Se
    Water                             42                        0
  R.m.s. deviations
    Bond length, Å                    0.005                     0.008
    Bond angle, °                     0.931                     1.846

Rmerge = Σ|I − ⟨I⟩| / ΣI
pairs with the second RNA molecule (the blue one),
whereas its consecutive CGCG sequence forms regular
Watson-Crick base pairs with the third RNA molecule
(the green one). As shown in Figure 5F, the SeU•U pair
displays a conventional hydrogen bond between O4 of the
native uracil (U1) and N3 of the Se-uracil (U14) and an
unusual C-H···Se hydrogen bond between C5 of native U
and Se4 of Se-U, through the Hoogsteen edge of native U
and the Watson-Crick edge of Se-U. These interactions
result in a trans-Hoogsteen U•U pair (Figure 5F).
Compared with the native structure, the substitution of 
the uridine 4-oxygen with a selenium atom does not 
change the structure significantly (Figure 5C), suggesting 
that the Hoogsteen U•U pair has space available at the
4-position of the Watson-Crick edge. A slight shift
(0.09 Å) of the Se-modified nucleobase is observed
(Figure 6B). The Hoogsteen C-H···Se (or O) hydrogen
bond (bond length: 3.36 Å in the Se case), between C5 of
native U and Se4 of Se-U (the corresponding native H-bond:
3.27 Å; Figure 6B), is still retained. Because the
selenium atom (1.16 Å in atomic radius) is 0.43 Å larger
than oxygen (0.73 Å in atomic radius), it is surprising to
find that the nucleobase shifts by only 0.09 Å to accommodate
the big selenium atom, confirming that the native
hydrogen bond (O4···H-C5) of the Hoogsteen pair is
weak. Thus, the large Se atom probe indicates that the
Hoogsteen H-bond is less important in the U•U pairing.
This also suggests that the trans-Hoogsteen pair can
tolerate a larger substitution and that the Hoogsteen
pair is not rigid, which gives the duplex junction sufficient
flexibility. Moreover, it is counterintuitive that the
distance (3.11 Å) between these two big neighboring 4-Se
atoms (Figure 6C) is even smaller (by 0.18 Å) than the
native distance (3.29 Å) between these two small O
atoms (Figure 6D), implying enhanced stacking interactions
between these two U•U pairs. Using electron-rich
selenium as the atomic probe, our structural result
suggests strong electron delocalization and stacking
interaction between these two U•U pairs. The structure
study provides new insights into the Hoogsteen U•U pair
and the uracil-mediated interactions in ncRNAs.

The Se-octamer structure (Figure 7), where the two Se
atoms point to the major groove, reveals formation of the
SeU-A pair and the typical right-handed A-form duplex by
the Se-RNA (Supplementary Table S2). Moreover, we
have superimposed the structures of the SeU-A pair (SeU4-A13)
and the U2-A15 pair (Figure 7D), as the corresponding
native structure is not available (from the literature or from us) for
direct comparison. This comparison of the base-pair structures
has demonstrated that the Se-modified and native U-A
pairs are similar. The major difference is the slight shift
of the SeU nucleobase to accommodate the large selenium
atom, revealing the flexibility of the RNA duplex structure.
The distance between SeU4 exo-Se4 and A13 exo-N6 is
3.54 Å, which is increased from the original 2.99 Å.
Considering that the atomic size of Se is 0.43 Å larger
than that of O and that a typical H-bond length is
2.8-3.2 Å, this distance (3.54 Å) suggests a weak
hydrogen bond after the Se-modification. On the other
hand, the polarizable and large Se atom with delocalizable
electrons may facilitate the base stacking interaction, as supported
by the narrower base-pair gap and the computational
study of the Se-nucleobase-modified DNA (46).
Using the Se atom probe, we found that the increased




Figure 7. The yellow crystals and structures of the 4-Se-U RNA octamer, (5'-GUG-SeU-ACAC-3')2. The Se atoms are labeled as purple balls.
(A) Crystal image. (B) The Se-RNA duplex structure (PDB ID: 4IQS; 2.75 Å resolution). (C) Electron density map (2Fo-Fc) and model of the SeU-A
pair at the level of 1.0 σ. (D) Superimposition of SeU-A pair (in pink) with native U2-A15 pair (in cyan); the H-bond lengths are indicated
individually.
stacking interaction can compensate for the loss of the H-bond
interaction, which is consistent with the virtually
identical duplex stability after the Se-modification
(Table 2). Moreover, most of the 2'-hydroxyl groups are
involved in H-bonding interactions with the 3'-sugar
ring oxygen (O4') or 3'-phosphate oxygen, which restrains
the conformations of the sugar-phosphate backbone,
thereby facilitating the intramolecular interaction and
reducing molecular dynamics. The Se-RNA crystallization
is consistent with the Se-enhanced base stacking and conformation
rigidification. In the crystal lattice, the duplexes
are stacked on top of each other in a head-to-tail
fashion, and three Se-RNA duplexes are present in an asymmetric
unit, where the three duplexes are virtually identical
(r.m.s. <0.1 Å). Chains A and B are shown in Figure 7.
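
The distance argument for the weakened Se4···N6 hydrogen bond can be made explicit with a short worked calculation. The sketch below uses only the values quoted in the text (2.99 Å native, 3.54 Å Se-modified, a 0.43 Å size difference and a typical upper H-bond length of 3.2 Å). Adding the size difference to the native distance and to the typical limit is a simplifying assumption made here for illustration, not a criterion used in the structure analysis, and the function names are hypothetical.

```python
TYPICAL_HBOND_MAX = 3.2   # Å, upper end of the typical range quoted in the text
SE_MINUS_O_RADIUS = 0.43  # Å, Se versus O size difference quoted in the text


def excess_lengthening(native_dist, modified_dist,
                       size_increase=SE_MINUS_O_RADIUS):
    """Lengthening (Å) beyond what the larger atom alone would add."""
    return round(modified_dist - (native_dist + size_increase), 2)


def within_adjusted_limit(distance, size_increase=SE_MINUS_O_RADIUS):
    """True if distance <= typical maximum plus the size increase (assumed)."""
    return distance <= TYPICAL_HBOND_MAX + size_increase


excess = excess_lengthening(2.99, 3.54)
still_in_contact = within_adjusted_limit(3.54)
print(f"excess lengthening: {excess:.2f} Å; "
      f"within adjusted limit: {still_in_contact}")
```

Under these assumptions the contact is 0.12 Å longer than the size-corrected native distance but still falls inside the widened limit, which fits the description of a weakened rather than lost hydrogen bond.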

Furthermore, X-ray crystallography is one of the most 
powerful methodologies for structure and function 
studies of RNAs and their complexes with ligands,
including protein-RNA complexes and RNA-small
molecule complexes, at atomic resolution. However,
owing to the difficulties in crystallization and phasing 
(phase determination or phase problem), progress in 
RNA crystallography is limited, especially in the ncRNA 
structure study. Inspired by the protein Se-derivatization, 
multi-wavelength anomalous dispersion phasing and SAD 
phasing (51-55), our laboratory has pioneered SeNA 
(36,37), which has great potential as a general strategy 
for RNA X-ray crystallography (37). This research work 
on the synthesis and structure studies of the 4-Se-uridine 
RNAs has further demonstrated that the selenium modi- 
fication is a useful approach for structural biology, as the 
Se-functionalization can facilitate phase determination, 
crystallization, RNA color and atomic probing. 

CONCLUSION 

To probe uracil-mediated interactions and base-pairs 
with a single selenium atom, we have synthesized the 
4-Se-uridine phosphoramidite and Se-RNAs. Our 
thermostability and structure studies indicate that the 
modified and native structures are virtually identical, 
that the H-bonding decrease in U-A pair can be 
compensated by the base-stacking increase, and that the 
uracil stacking in duplex junction may increase duplex 
thermostability. We also found that the stacking interaction
of the two trans-Hoogsteen U•U pairs is the
main contributor to the duplex junction stability,
whereas the Hoogsteen H-bond is weak. Moreover, the
accommodation of larger Se atoms in uracil by both U-A
and U•U pairs implies RNA flexibility. Using the Se
atom probe, our studies confirm that uracil is capable of
interacting in multiple modes, thereby diversifying U•U
and U-A pairs in structure and function. Our thermodynamic
and structural studies have also demonstrated
that this Se-modification can facilitate the nucleobase
stacking interaction and potential crystal growth without
significant perturbation. Furthermore, this Se-modification
generates colored RNA for the first time by single
atom replacement, and it shifts the uridine UV spectrum
by over 100 nm (SeU λmax: 370 nm; ε: 1.30 × 10^4 M^-1 cm^-1).
This color property is useful for RNA-protein
co-crystallization, RNA visualization, detection and spectroscopic
study. This work provides a new strategy for
crystallization, phasing, structure and function studies of
ncRNAs and protein-RNA complexes.